High Order LSTM/GRU

Author

  • Wenjie Luo
Abstract

RNNs are powerful models for sequence data, but they suffer from vanishing and exploding gradients and are therefore difficult to train to capture long-range dependencies. LSTM and GRU address this by modelling the difference between adjacent data frames rather than the frames themselves, which allows the error to back-propagate over longer spans of time without vanishing. In addition, instead of simply accumulating information, these models introduce multiple gating functions, which depend on the current input and hidden state, to control whether information is taken in, passed through, or discarded. Currently, all of these gating functions are sigmoid/tanh functions applied to a linear combination of the input x and the current hidden state h. In this work, we introduce a high-order term as a component of the RNN's gating function. We argue that this better models the relation between the current input and the hidden state, and can therefore better control the flow of information and learn better representations.
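
The abstract does not spell out the exact form of the high-order term, so the following is only a minimal sketch: it assumes a bilinear interaction between x and h added inside an otherwise standard sigmoid gate, and all function names, shapes, and the choice of a bilinear form are illustrative assumptions rather than the paper's actual formulation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def linear_gate(x, h, W_x, W_h, b):
        # Conventional gate: sigmoid over a linear combination of input x and hidden state h.
        return sigmoid(W_x @ x + W_h @ h + b)

    def high_order_gate(x, h, W_x, W_h, T, b):
        # Assumed high-order variant: add a bilinear term x^T T_k h for every gate unit k
        # on top of the usual linear terms, so the gate can capture multiplicative
        # interactions between the current input and the hidden state.
        bilinear = np.einsum('i,kij,j->k', x, T, h)
        return sigmoid(W_x @ x + W_h @ h + bilinear + b)

    # Toy usage with random parameters (sizes are arbitrary).
    rng = np.random.default_rng(0)
    n_in, n_hid = 4, 3
    x, h = rng.standard_normal(n_in), rng.standard_normal(n_hid)
    W_x = 0.1 * rng.standard_normal((n_hid, n_in))
    W_h = 0.1 * rng.standard_normal((n_hid, n_hid))
    T = 0.1 * rng.standard_normal((n_hid, n_in, n_hid))
    b = np.zeros(n_hid)
    print(linear_gate(x, h, W_x, W_h, b))
    print(high_order_gate(x, h, W_x, W_h, T, b))

The bilinear tensor T makes the gate a second-order function of (x, h); a practical implementation would likely factorize T to keep the parameter count manageable, but that detail goes beyond what the abstract states.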


Similar articles

Forecasting the mean monthly discharge of the Karun River using a hybrid GRU-LSTM method

Modeling river discharge is of great importance in water-resource management and risk management. This matters even more in mountainous regions, since most downstream populations depend heavily on agriculture and on commercial activities such as electricity generation. In this context, machine-learning models have attracted considerable attention in recent years because of their high predictive accuracy achieved through black-box learning. ...

Full text

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition

Popularized by the long short-term memory (LSTM), multiplicative gates have become a standard means to design artificial neural networks with intentionally organized information flow. Notable examples of such architectures include gated recurrent units (GRU) and highway networks. In this work, we first focus on the evaluation of each of the classical gated architectures for language modeling fo...
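
As a reminder of what such a multiplicative gate looks like, here is a minimal highway-style layer in which a transform gate blends a candidate activation with the unchanged input; the NumPy formulation and parameter names are illustrative, not taken from the paper summarized above.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def highway_layer(x, W_h, b_h, W_t, b_t):
        # Candidate transformation of the input (square weight matrices, shape d x d).
        h = np.tanh(W_h @ x + b_h)
        # Transform gate: per-unit weight deciding how much of the transformation to use.
        t = sigmoid(W_t @ x + b_t)
        # Multiplicative gating: blend the transformed and the untouched input.
        return t * h + (1.0 - t) * x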

Full text

Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) require 4 gates to learn short- and long-term dependencies for a given sequence of basic elements. Recently, “Gated Recurrent Unit” (GRU) has been introduced and requires fewer gates than LSTM (reset and update gates), to code short- and long-term dependencies and reaches equivalent performances to LSTM, with less processing time during ...
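
For context, here is a minimal sketch of one step of a standard GRU with its two gates (reset and update); it follows the commonly used GRU formulation with biases omitted, and the variable names are only illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
        # Update gate: how much of the new candidate replaces the old state.
        z = sigmoid(Wz @ x + Uz @ h_prev)
        # Reset gate: how much of the previous state feeds the candidate.
        r = sigmoid(Wr @ x + Ur @ h_prev)
        # Candidate state built from the input and the reset-scaled previous state.
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))
        # Convex combination of the old state and the candidate.
        return (1.0 - z) * h_prev + z * h_tilde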

Full text

Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit

Recurrent Neural Networks architectures excel at processing sequences by modelling dependencies over different timescales. The recently introduced Recurrent Weighted Average (RWA) unit captures long term dependencies far better than an LSTM on several challenging tasks. The RWA achieves this by applying attention to each input and computing a weighted average over the full history of its comput...
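
The snippet above describes the RWA as attending to every input and maintaining a weighted average over the entire history; the sketch below shows only that core running-average idea, treating the attention scores and candidate vectors as given, and does not reproduce the RWA paper's exact parameterization.

    import numpy as np

    def running_weighted_average(scores, candidates):
        # scores[t]: unnormalized attention score for step t (a scalar)
        # candidates[t]: value vector proposed at step t
        num = np.zeros_like(candidates[0], dtype=float)  # running weighted sum
        den = 0.0                                        # running sum of weights
        outputs = []
        for a_t, z_t in zip(scores, candidates):
            w = np.exp(a_t)            # positive weight derived from the score
            num = num + w * z_t
            den = den + w
            outputs.append(num / den)  # average over everything seen so far
        return outputs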

Full text

Recurrent Discounted Attention

Full text



Publication date: 2016